Gamera: A Python-based Toolkit for Structured Document Recognition

Authors

  • Karl MacMillan
  • Ichiro Fujinaga

Abstract

This paper presents Gamera, a new toolkit for the creation of domain-specific structured document recognition applications by domain experts with limited programming experience. The goal of the Gamera system is to leverage the user’s knowledge of the target documents to create custom applications rather than attempting to meet the needs of diverse users with a monolithic application. The system allows a knowledgeable user to combine image processing and recognition tools in an intuitive, interactive, graphical scripting environment based on Python. The use of Python in Gamera creates a simple yet powerful and flexible programming environment for novice programmers. Additionally, the resulting applications are suitable for a large-scale digitization project because they can be run in a batch-processing mode and easily integrated into a digitization framework. Finally, the Python module system has been extended to allow the easy creation of plugins using Python or C++.

Similar resources

Gamera: Optical music recognition in a new shell

An optical music recognition system has been completely overhauled and reformatted into a new framework called Gamera. The new open-source software is not only designed to recognize various music notations, including handwritten scores, but can be used to develop systems that can recognize many other structured documents. Gamera is intended to be used by domain experts with particular knowledge...

A Multiple-Choice Test Recognition System based on the Gamera Framework

This article describes JECT-OMR, a system that analyzes digital images representing scans of multiple-choice tests compiled by students. The system performs a structural analysis of the document in order to get the chosen answer for each question, and it also contains a bar-code decoder, used for the identification of additional information encoded in the document. JECT-OMR was implemented usin...

Using the Gamera Framework for Building a Lute Tablature Recognition System

In this article we describe an optical recognition system for historic lute tablature prints that we have built with the aid of the Gamera toolkit for document analysis and recognition. We give recognition rates for various historic sources and show that our system works quite well on printed tablature sources using movable types. For engraved and manuscript sources, we discuss some principal c...

The Gamera framework for building custom recognition systems

This paper describes the Gamera framework for building custom document recognition systems. This open-source system is designed to support the test-and-refine development cycle: an important style for developing recognition systems that work with difficult historical documents, since the solutions are often non-obvious. This paper explains the overall architecture of the system, in addition to d...

Transkribus Python Toolkit

This paper introduces an open source Python toolkit for the Transkribus platform. One part of the toolkit offers a Python client for the Transkribus RESTful interface. The second part offers various Document Understanding tools. The open-source toolkit is freely available through GitHub. Keywords—Transkribus platform, RESTful client, Document Understanding, Conditional Random Fields, Sequential...

Publication date: 2001